ABSTRACT

Context. Detecting cosmic ray hits (cosmics) in fiber-fed integral-field spectroscopy (IFS) data of single exposures is a challenging 
task because of the complex signal recorded by IFS instruments. Existing detection algorithms are commonly found to be unreliable 
in the case of IFS data, and the optimal parameter settings are usually unknown a priori for a given dataset.

Aims. The Calar Alto legacy integral field area (CALIFA) survey generates hundreds of IFS datasets for which a reliable and robust 
detection algorithm for cosmics is required as an important part of the fully automatic CALIFA data reduction pipeline. Such a new 
algorithm needs to be tested against the performance of the commonly used algorithms L.A.Cosmic and DCR. General
recommendations for the usage and optimal parameter settings of each algorithm have not yet been systematically established for
fiber-fed IFS datasets to guide users in their choice.

Methods. We developed a novel algorithm, PyCosmic, which combines the edge-detection algorithm of L.A.Cosmic with a
point-spread function convolution scheme. We generated mock data to compute the efficiency of different algorithms for a wide range
of characteristic fiber-fed IFS datasets using the Potsdam Multi-Aperture Spectrophotometer (PMAS) and the Visible Multi-Object
Spectrograph (VIMOS) IFS instruments as representative cases.

Results. PyCosmic is the only algorithm that achieves an acceptable detection performance for CALIFA data. We find that PyCosmic 
is the most robust tool with a detection rate of > 90% and a false detection rate < 5% for any of the tested IFS data. It has one less 
free parameter than the L.A.Cosmic algorithm. Only for strongly undersampled IFS data does L.A.Cosmic exceed the performance
of PyCosmic by a few per cent. DCR never reaches the efficiency of the other two algorithms and should only be used if computational 
speed is a concern. Thus, PyCosmic appears to be the most versatile cosmics detection algorithm for IFS data. It is implemented in 
the new CALIFA data reduction pipeline as well as in recent versions of the multi-instrument IFS pipeline P3D. Although PyCosmic 
has been optimized for IFS data, we have also successfully applied it to longslit data and anticipate that good results will be achieved 
with imaging data. 

Key words. Techniques: image processing - Methods: miscellaneous 



1. Introduction 

The identification and rejection of artefacts on charge-coupled device (CCD) detectors caused by
cosmic ray hits (hereafter cosmics) is a persistent problem for the reduction and analysis of
astronomical data. Combining multiple images of the same object or field is considered the best
method to identify cosmics because it is unlikely that the same pixel is affected in several
images. Sophisticated algorithms that detect and reject outlier pixels during the combination of
exposures were developed, for example, for the Hubble Space Telescope (e.g., Fruchter & Hook 1997).
However, there are often cases where only a single exposure is available or multiple exposures
cannot be combined. This happens frequently with fiber-fed integral field spectroscopic (IFS) data,
where the effects of differential atmospheric refraction, instrument flexure, or a variable sky
brightness during the sequence of exposures prevent reliable detection of cosmics by image
comparison.

Various techniques have been developed to detect and reject cosmics in single CCD exposures. They
use different methods like trained neural networks (Salzberg et al. 1995), convolution with a
point-spread function (PSF, Rhoads 2000), Laplacian edge detection (van Dokkum 2001, hereafter D01),
image statistics (Pych 2004, hereafter P04), or a fuzzy logic approach (Shamir 2005). A detailed
performance evaluation of the different algorithms on single astronomical images was presented by
Farage & Pimbblet (2005). Their tests revealed that the D01 algorithm, also known as L.A.Cosmic,
performed well on imaging data. The algorithm of P04, known as DCR, did not perform as well on
images, but was much less computationally expensive and primarily designed for spectroscopic data.

* PyCosmic is freely available as a Python-based stand-alone program at http://pycosmic.sf.net
for download.

Currently, a thorough evaluation of the performance of cosmics detection algorithms for fiber-fed
IFS data is missing. Signals in such data are much more complex because a spectrum from each
individual fiber is recorded along a discrete trace on the CCD, with only small gaps between
spectra. Thus, edge-like structures are introduced, and bright object or night-sky emission lines
are more likely to be misclassified as cosmics, which is why automatic data-reduction pipelines
generally avoid including this crucial step in the reduction process (e.g., Barnsley et al. 2012).
Sophisticated methods to detect cosmics in data from fiber-fed multi-object spectrographs were
presented (Zhu et al. 2009; Wang et al. 2009) and show excellent results, but their parameter
choices seem arbitrary for the L.A.Cosmic and DCR algorithms as their prime reference. Additionally,
there is no public code available to make an independent check of their results and to verify
whether the algorithm also works with IFS data.

B. Husemann et al.: PyCosmic: detecting cosmics in fiber-fed IFS datasets

For the Calar Alto legacy integral field area (CALIFA) survey (Sánchez et al. 2012) and other IFS
studies using the same instrument (e.g., Sandin et al. 2008), it was discovered that the available
algorithms always selected night-sky or object emission lines as cosmics for the IFS data. An
initial attempt to reduce the high false detection rate for CALIFA data by using a simplified
Laplacian edge detection algorithm was implemented in the R3D reduction package (Sánchez 2006) and
was only partly successful. Although it reduced the number of false detections, a significant
number of cosmics remained undetected.

In this paper, we present a novel algorithm called PyCosmic. It combines the iterative Laplacian
edge detection scheme with a PSF convolution approach. We evaluate the performance of PyCosmic
against the most popular algorithms available, DCR and L.A.Cosmic, on realistic mock data for
different IFSs and compare the results with illustrative examples on observed raw data. We then
provide general recommendations regarding the use of detection algorithms with fiber-fed IFS data.

The different algorithms used in this study are briefly described in Section 2. Results of our
detailed performance and parameter study on IFS mock data are then presented in Section 3,
followed by results obtained for real data in Section 4. Finally, we provide general
recommendations as a guide for other IFS users in Section 5.

2. Outline of different cosmics detection algorithms 

In the following, we briefly describe the three algorithms used in 
our comparative study, including our novel PyCosmic algorithm, 
to understand their basic differences. 



2.1. DCR, count statistics on subframes 

A simple and fast algorithm was presented by P04, which uses count statistics to detect cosmics as
outliers in the histogram of pixel counts. To do this, the image $I$ is first split into small
overlapping subframes $I_i$ that are treated separately. These subframes are intentionally kept
small, <100 pixels, to consider only a local distribution of counts. Thereafter, the mode
$\langle I_i \rangle$ and standard deviation $\sigma_i$ are calculated for all pixels in a subframe.
To remove the influence of high-value pixels, $\langle I_i \rangle$ and $\sigma_i$ are calculated a
second time, this time only including pixel values $m$ that satisfy
$\langle I_i \rangle - \xi\sigma_i < m < \langle I_i \rangle + \xi\sigma_i$, where $\xi$ is an
arbitrary threshold value.

The subsequent steps are: (i) construct a histogram $h(I_i)$ using all pixel values $m$, (ii) search
for the first empty histogram bin with a value $m_0$ that is higher than $\langle I_i \rangle$, and
(iii) find the first gap $[m_1, m_2]$ in the histogram with zero number counts that fulfils
$(m_2 - m_1) > \xi\sigma_i$ and $m_1 > m_0$.

If such a gap exists, then all pixels in $I_i$ with a value higher than $m_1$ are masked as cosmics.
Masked pixels (including neighbor pixels inside a so-called "growing radius" of one to two pixels)
are then replaced with the mean value of a set of nearby pixels. In most applications of DCR, the
growing radius is set to one pixel to fully cover the boundaries of the cosmics, but we use a
zero-growing radius to achieve a fair comparison with the results of other algorithms. Furthermore,
to account for multiple-pixel cosmics, the algorithm is run iteratively. There are three free
parameters that have to be set: the shape of the subframes $I_i$, the threshold value $\xi$, and the
number of iterations.
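The gap search described above can be illustrated with a minimal sketch for a single subframe. It is not the DCR code itself: the function name `dcr_subframe_mask` is hypothetical, the mean is used as a stand-in for the mode, and the search for a run of empty histogram bins is replaced by the equivalent search for a jump larger than $\xi\sigma_i$ between neighbouring sorted pixel values. Subframe tiling, the growing radius, pixel replacement, and the iteration loop are left out.

```python
import numpy as np

def dcr_subframe_mask(sub, xi=3.0):
    """Flag pixels above the first wide gap in the count distribution."""
    sub = np.asarray(sub, dtype=float)
    vals = sub.ravel()
    # First pass: location and scatter of all pixels.
    # Assumption: the mean stands in for the mode used in the text.
    centre, sigma = vals.mean(), vals.std()
    if sigma == 0:
        return np.zeros(sub.shape, dtype=bool)   # constant subframe
    # Second pass: repeat with outliers beyond xi * sigma removed.
    keep = np.abs(vals - centre) < xi * sigma
    centre, sigma = vals[keep].mean(), vals[keep].std()
    # Sorted counts above the centre; a jump between neighbours larger
    # than xi * sigma corresponds to a run of empty histogram bins.
    above = np.sort(vals[vals > centre])
    wide = np.nonzero(np.diff(above) > xi * sigma)[0]
    if wide.size == 0:
        return np.zeros(sub.shape, dtype=bool)   # no gap, nothing masked
    m1 = above[wide[0]]                          # last populated value below the gap
    return sub > m1

# Toy subframe: smooth background of 100-106 counts plus two bright pixels.
clean = 100.0 + (np.arange(100).reshape(10, 10) % 7)
hit = clean.copy()
hit[2, 3] = 500.0
hit[2, 4] = 520.0
mask = dcr_subframe_mask(hit)
```

In this toy case only the two bright pixels lie above the gap. A bright emission line inside the subframe would populate the histogram in the same way, which is one reason why count statistics alone struggle on spectroscopic data.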



2.2. L.A.Cosmic, the Laplacian edge detection approach

D01 was the first to use the Laplacian operator for the detec- 
tion of cosmics in astronomical images. In its discrete form, the 
Laplacian operator can be written as 



$$\nabla^2 f = \begin{pmatrix} 0 & -1 & 0 \\ -1 & 4 & -1 \\ 0 & -1 & 0 \end{pmatrix}. \tag{1}$$



Convolved images using this operator will highlight sharp edges 
because it removes a smooth signal and increases the contrast of 
isolated strong pixels. 

The algorithm starts by subsampling the bias-subtracted image $I$ by a factor $f_s = 2$. This
subsampling is required to avoid attenuation of cosmics by negative cross patterns when convolving
the image with the discrete Laplacian kernel

$$\mathcal{L}^{(2)} = \nabla^2 f \otimes I^{(2)}, \tag{2}$$

where $I^{(2)}$ is the subsampled image and $\otimes$ is the convolution operator. All negative
values in $\mathcal{L}^{(2)}$ are set to zero before the image is resampled to its original size. We
refer to the resulting image as $\mathcal{L}^+$.
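A small numerical sketch may help to see what the subsampling and clipping do. It assumes that subsampling means replicating each pixel into an $f_s \times f_s$ block and that resampling means block-averaging back to the original grid; the helper name `laplacian_plus` and the edge-replicating border treatment are choices made for this illustration only.

```python
import numpy as np

# Discrete Laplacian kernel of Eq. (1).
LAPLACE = np.array([[0.0, -1.0, 0.0],
                    [-1.0, 4.0, -1.0],
                    [0.0, -1.0, 0.0]])

def laplacian_plus(img, fs=2):
    """Subsample, convolve with the Laplacian, clip negatives, resample."""
    img = np.asarray(img, dtype=float)
    # Subsampling: every pixel becomes an fs x fs block (assumption).
    big = np.repeat(np.repeat(img, fs, axis=0), fs, axis=1)
    ny, nx = big.shape
    padded = np.pad(big, 1, mode="edge")
    lap = np.zeros((ny, nx))
    # The kernel is symmetric, so correlation and convolution coincide.
    for dy in range(3):
        for dx in range(3):
            lap += LAPLACE[dy, dx] * padded[dy:dy + ny, dx:dx + nx]
    lap[lap < 0] = 0.0                      # remove the negative cross pattern
    # Resampling: average each fs x fs block back to one pixel (assumption).
    return lap.reshape(ny // fs, fs, nx // fs, fs).mean(axis=(1, 3))

# Flat background of 10 counts with one pixel 100 counts above it.
img = np.full((5, 5), 10.0)
img[2, 2] = 110.0
lplus = laplacian_plus(img)
```

For this single-pixel event the result is 200 at the affected pixel and zero elsewhere, that is $f_s$ times the excess of 100 counts, which is consistent with the division by $f_s$ in Eq. (4).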

Cosmics are identified in $\mathcal{L}^+$ with respect to the expected noise of each pixel. The
noise properties of $\mathcal{L}^+$ and $I$ are nearly equal for higher standard deviations (D01),
which is why $I$ can be used to estimate the noise ($N$),

$$N = \frac{1}{g}\sqrt{g\,(M_5 \otimes I) + \sigma_{\rm rn}^2}, \tag{3}$$

where $g$ is the gain [e$^-$ ADU$^{-1}$], $\sigma_{\rm rn}$ the readout noise [e$^-$], and $M_n$ an
$n \times n$ median filter (here $n = 5$ pixels). Deviations from the expected noise are calculated
as

$$S = \frac{\mathcal{L}^+}{f_s N}. \tag{4}$$
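Equations (3) and (4) translate almost directly into array code. The sketch below is only an illustration: the helper names are hypothetical, the median filter is built from NumPy's `sliding_window_view` with edge replication (any equivalent median filter would do), and negative smoothed counts are clipped to zero before taking the square root, which is an added safeguard rather than part of the text.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(img, n):
    """n x n median filter M_n with edge replication at the borders."""
    pad = n // 2
    padded = np.pad(np.asarray(img, dtype=float), pad, mode="edge")
    return np.median(sliding_window_view(padded, (n, n)), axis=(2, 3))

def noise_model(img, gain, readnoise):
    """Expected noise N in ADU, Eq. (3)."""
    smooth = np.clip(median_filter(img, 5), 0.0, None)  # guard (assumption)
    return np.sqrt(gain * smooth + readnoise ** 2) / gain

def significance(lplus, noise, fs=2):
    """Deviation from the expected noise, Eq. (4)."""
    return lplus / (fs * noise)

# Flat frame of 100 ADU, gain 2 e-/ADU, read-out noise 5 e-.
img = np.full((7, 7), 100.0)
noise = noise_model(img, gain=2.0, readnoise=5.0)
s_map = significance(np.full((7, 7), 150.0), noise)
```

Here $N = \sqrt{2 \cdot 100 + 25}/2 = 7.5$ ADU, so a Laplacian value of 150 corresponds to $S = 10$.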



Signals of real objects remain in $S$ because of Poisson noise and the pixel sampling of smooth
intensity profiles (cf. Fig. 1). This component, the sampling flux, can be significant if the
signal is high or if the PSF is poorly sampled. Extended structures, which are larger than about
five pixels, can be removed from $S$ using a 5 × 5 median filter: $S' = S - (M_5 \otimes S)$. The
first criterion for detection of cosmics demands $S' > \sigma_{\rm lim}$, where a typical limiting
value is $\sigma_{\rm lim} = 5$.

In addition, it will be difficult to distinguish cosmics from stars in a critically sampled image,
i.e., close to Nyquist sampling, because they are very similar on small scales. Such point sources
can, however, be distinguished from cosmics by their symmetry. An image $\mathcal{F}$ is calculated
that contains only symmetric fine structure on scales of 2-3 pixels,

$$\mathcal{F} = (M_3 \otimes I) - [M_7 \otimes (M_3 \otimes I)]. \tag{5}$$

The second criterion states that the contrast between $\mathcal{L}^+$ and $\mathcal{F}$ is greater
than a limiting value, $\mathcal{L}^+/\mathcal{F} > f_{\rm lim}$, where typical limiting values for
images are $f_{\rm lim} = 2.0$-$5.0$.
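The effect of the fine-structure image can be seen on two toy cases, a single-pixel event and a compact symmetric source. This is a sketch under stated assumptions: the helper names are hypothetical, the median filter is the same NumPy construction as one might use for Eq. (3), and a small floor is applied to $\mathcal{F}$ to avoid division by zero (the value 0.01 is an illustrative choice, not taken from the text).

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter(img, n):
    """n x n median filter M_n with edge replication at the borders."""
    pad = n // 2
    padded = np.pad(np.asarray(img, dtype=float), pad, mode="edge")
    return np.median(sliding_window_view(padded, (n, n)), axis=(2, 3))

def fine_structure(img, floor=0.01):
    """Fine-structure image of Eq. (5), with a floor against division by zero."""
    m3 = median_filter(img, 3)
    return np.maximum(m3 - median_filter(m3, 7), floor)

def second_criterion(lplus, fine, f_lim=2.0):
    """True where the contrast L+/F exceeds f_lim."""
    return lplus / fine > f_lim

# Single-pixel event versus a symmetric 3 x 3 source on an empty background.
cosmic = np.zeros((9, 9))
cosmic[4, 4] = 100.0
star = np.zeros((9, 9))
star[3:6, 3:6] = 100.0
f_cosmic = fine_structure(cosmic)
f_star = fine_structure(star)
```

The 3 × 3 median removes the isolated pixel completely, so $\mathcal{F}$ stays at the floor and the contrast becomes very large, whereas the symmetric source survives in $\mathcal{F}$ and keeps the contrast low. Closely packed fiber traces break this symmetry argument, which is the situation addressed in Section 2.3.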

Cosmics are finally identified as pixels that satisfy both criteria, although cosmics are mostly
larger than a single pixel. While the detection probability is higher for pixels on the edge of
large multiple-pixel features, it may be negligible for pixels within the feature. Arbitrarily
large cosmics can be fully detected by iteratively applying the rejection process as described
above. After each iteration, the newly identified cosmics are replaced with the median of nearby
unmasked pixels. In total, there are four free parameters to set: the threshold value
$\sigma_{\rm lim}$, the limiting value $f_{\rm lim}$, the number of iterations, and a special
parameter $\sigma_{\rm frac}$. This last undocumented parameter is a factor that is used to reduce
the $\sigma_{\rm lim}$ threshold of neighbor pixels within a growing radius to detect cosmics in
them as well. By default, the growing radius is set to one pixel. By definition, the effective
growing radius is set to zero when $\sigma_{\rm frac} = 1.0$.

Fig. 1. Visual outline of the intermediate steps of our PyCosmic algorithm, which is based on
L.A.Cosmic.



2.3. PyCosmic, combining edge detection with a PSF 
convolution approach 

Although L.A.Cosmic performs extremely well on imaging data, it is much less effective with
fiber-fed IFS data. Spectra of several hundred fibers are dispersed along one axis of the CCD and
are closely packed along the other axis with small separations. This introduces a highly
asymmetric situation on small pixel scales, which is why neither of the L.A.Cosmic criteria is
robust in this case. The longslit spectroscopy version of L.A.Cosmic invokes a model fit to the sky
and object spectra. However, this scheme is difficult to apply to fiber-fed IFS data due to the
comparatively inhomogeneous distribution of spectra on the CCD. It is practically impossible to fit
thousands of profiles without introducing additional edges.

Our novel approach replaces the second criterion of L.A.Cosmic to avoid the simple median
smoothing. Instead, we take advantage of the smooth two-dimensional shape of the spectrograph PSF
in contrast to the highly asymmetric cosmics. This combines PSF-matched filtering (Rhoads 2000) and
Laplacian edge detection of cosmics in a simple and effective way.

To discriminate between real signal and cosmics, we first smooth the bias-subtracted image $I$ by
convolving it with a two-dimensional Gaussian kernel $G(w)$ (where $w$ is the full width at half
maximum (FWHM) of the Gaussian in pixels) and divide $I$ by the smoothed image,

$$R = \frac{I}{I \otimes G(w)}. \tag{6}$$



The idea is to increase the contrast of higher frequency signal compared to the object signal, so
that $w$ should be larger than the width of the cosmics and smaller than the FWHM of the
spectrograph PSF ($\theta$). We note that a similar approach was independently used by
Conselice (2003) to define the clumpiness parameter as a measure for the high-frequency components
in the morphology of galaxies. The artificial smoothing of the data is a computationally easy task
and the most natural choice to capture the high-frequency components in a signal. Cosmics with
$R \gg 1$ appear surrounded by a halo with values of $R \ll 1$. When $R \approx 1$, however, there
is a homogeneous structure. We further increase the contrast of cosmics in $R$ by convolving it
with the Laplacian kernel after subsampling $R$ by a factor of two,

$$\mathcal{R}^{(2)} = \nabla^2 f \otimes R^{(2)}. \tag{7}$$

Again, negative values in $\mathcal{R}^{(2)}$ are set to 0, and the image is resampled to its
original size; we refer to this result as $\mathcal{R}^+$.

We then replace the second criterion of L.A.Cosmic by $\mathcal{R}^+ > r_{\rm lim}$, in order to
minimize the false detection of real signal in fiber-fed IFS data. Snapshots of intermediate images
of the process steps are shown in Fig. 1, which give a visual impression of the algorithm.
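The chain from Eq. (6) to the criterion $\mathcal{R}^+ > r_{\rm lim}$ can be sketched in a few lines. This is an illustration under explicit assumptions and not the PyCosmic source: the function names are hypothetical, the Gaussian is applied as a separable kernel truncated at three standard deviations with edge replication, subsampling and resampling are done by block replication and block averaging, the input is assumed to be strictly positive, and the threshold of 4.0 is a made-up value for the toy image.

```python
import numpy as np

LAPLACE = np.array([[0.0, -1.0, 0.0],
                    [-1.0, 4.0, -1.0],
                    [0.0, -1.0, 0.0]])

def gaussian_smooth(img, fwhm):
    """Separable Gaussian smoothing with G(w), w given as FWHM in pixels."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    half = max(1, int(np.ceil(3.0 * sigma)))
    x = np.arange(-half, half + 1)
    kern = np.exp(-0.5 * (x / sigma) ** 2)
    kern /= kern.sum()
    padded = np.pad(np.asarray(img, dtype=float), half, mode="edge")
    rows = np.apply_along_axis(np.convolve, 1, padded, kern, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, kern, mode="valid")

def laplacian_plus(img, fs=2):
    """Subsample, apply the Laplacian kernel, clip negatives, resample."""
    big = np.repeat(np.repeat(img, fs, axis=0), fs, axis=1)
    ny, nx = big.shape
    padded = np.pad(big, 1, mode="edge")
    lap = np.zeros((ny, nx))
    for dy in range(3):
        for dx in range(3):
            lap += LAPLACE[dy, dx] * padded[dy:dy + ny, dx:dx + nx]
    lap[lap < 0] = 0.0
    return lap.reshape(ny // fs, fs, nx // fs, fs).mean(axis=(1, 3))

def pycosmic_contrast(img, fwhm):
    """R+ image: ratio of Eq. (6) followed by the Laplacian step of Eq. (7)."""
    ratio = img / gaussian_smooth(img, fwhm)   # assumes strictly positive counts
    return laplacian_plus(ratio)

# Flat signal of 100 counts with one cosmic pixel of 1100 counts, w = 2 pixels.
img = np.full((11, 11), 100.0)
img[5, 5] = 1100.0
rplus = pycosmic_contrast(img, fwhm=2.0)
is_cosmic = rplus > 4.0                        # hypothetical r_lim
flat_rplus = pycosmic_contrast(np.full((11, 11), 100.0), fwhm=2.0)
```

In this toy case the ratio image is about 3.4 at the hit and below 1 in the surrounding halo, so the Laplacian step leaves a single strong peak. A real implementation would combine this test with the first criterion on $S'$ and iterate.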

Before the Laplacian convolution is applied to the image in each subsequent iteration, it is
necessary to replace all the pixels of the detected cosmics with "good" values. Yet, it is very
difficult to restore the original information of these pixels in IFS data, particularly near bright
emission lines. An artificial edge could be created that can cause the cosmics to expand into the
unaffected signal of the line. However, it is straightforward to mask all pixels of already
detected cosmics in the convolution operation $I \otimes G(w)$. In this way, we minimize the
artificial extension of cosmics into bright object data. PyCosmic does not employ the
$\sigma_{\rm frac}$ parameter and thus has one less free parameter than L.A.Cosmic.

The threshold value $r_{\rm lim}$ depends on two independent parameters: the FWHM $w$ of the
Gaussian kernel $G(w)$ and the FWHM of the instrumental PSF ($\theta$). We consider here that
$\theta$ and $w$ are both identical for the dispersion and the cross-dispersion directions on the
CCD. This is a realistic assumption for most IFS instruments, except when pixels are binned
differently on the x- and the y-axes during CCD read-out. In that case, the axis with the smallest
$\theta$ value is used as reference to select $r_{\rm lim}$ with respect to $w$ of the round
Gaussian kernel. To estimate a minimum $r_{\rm lim}$ value for a given setup, we simulated and
processed snapshot images of a single emission line for a grid of $\theta$ and $w$ values. In
Fig. 2 we show the $\mathcal{R}^+$ image for a few simulated cases with $1.5 \leq \theta \leq 2.5$
and $w = 2$. The maximum value in $\mathcal{R}^+$ defines the absolute minimum value of
$r_{\rm lim}$ to avoid misidentification of object signal as cosmics. The results from the
parameter study with $1.0 \leq \theta \leq 4.2$ and $1.0 \leq w \leq 4.5$ are summarized in Fig. 3,
which serves as a guideline for selecting an appropriate $r_{\rm lim}$ for any dataset. However,
these are idealized values in the sense that no noise, no underlying continuum, and no cross-talk
between the different fibers have been taken into account. Hence, the optimal $r_{\rm lim}$
threshold for real data should be slightly higher.

Fig. 2. Simulated images to estimate the minimum threshold value for $r_{\rm lim}$. Simulated
thumbnail images and intermediate images to compute $\mathcal{R}^+$ are shown for a pure emission
line feature considering three different instrumental PSFs with $\theta = 2.5$, 2.0, and 1.5 pixels
FWHM. The maximum pixel value is provided for each subimage. The maximum in $\mathcal{R}^+$
corresponds to the minimum value for $r_{\rm lim}$ to avoid misclassification as cosmics.

3. Performance tests of the detection algorithms 

3.1. Simulation of mock fiber-fed IFS data 

We prepared dedicated simulations to test the performance of PyCosmic against DCR and L.A.Cosmic.
In order to obtain unbiased results from the simulations, it is important to ensure that the signal
distribution and shapes of cosmics are as realistic as possible. As we mentioned earlier,
$\sigma$-clipping is unfeasible for the majority of IFS data. Instead, we use dark frames to
extract our template cosmics masks for two telescopes/instruments: the Potsdam Multi-Aperture
Spectrophotometer (PMAS, Roth et al. 2005) at the 3.5m Calar Alto telescope and the Visible
Multi-Object Spectrograph (VIMOS, Le Fèvre et al. 2003) at the Very Large Telescope. Dark frames
are ideally suited for our purpose because they do not contain any signal, yet their long exposure
times (~1800 s) are comparable to those of typical science frames. We determined the noise level in
the dark frames and selected all outliers above a 5σ threshold as cosmics. Given the low dark
current and read-out noise of the detectors, the 5σ limit corresponds to ~30 counts, the minimum
signal of recovered cosmics.

Fig. 3. Absolute minimum values of $r_{\rm lim}$ as a function of $\theta$ and $w$. These values
were determined from noise-free simulations and set a hard lower limit to avoid frequent false
detections of object signal as cosmics. They are estimated from the instrument-specific value of
$w$. The shaded area highlights the regime of significantly undersampled data, where more care is
needed.

Table 1. Detection quality of the best parameters for the simulated data

The PMAS instrument offers two integral-field units (IFUs): a lens array (LArr) with 16 × 16 lenses
and a simple bundle of 382 fibers (PPak, Kelz et al. 2006), with the latter used in the CALIFA
survey (Sánchez et al. 2012). VIMOS is a versatile imaging and multi-object spectrograph, which
also includes a lens-array IFU. We simulated IFS raw data including the night-sky spectra as the
main signal, which is understood well from existing observations for the three following IFU
instruments/setups:

1) PMAS-LArr IFU with a R1200 grating backward (bw) mounted and a 2×2 CCD binning, leading to
$\theta \sim 1.5$ pixels,

2) PMAS-PPak IFU with a V500 grating and a 2×2 CCD binning resulting in $\theta \sim 2.4$ pixels
(CALIFA survey setup),

3) VIMOS IFU with the mid resolution (MR) grism operating at $\theta \sim 3$ pixels.

These IFU setups cover a wide range in data sampling from significantly undersampled to very well
sampled data. We consider them representative of all fiber-fed IFUs that are currently in
operation. Synthetic IFS data for each setup were produced using reduced fiberflats, traces, and
dispersion masks from real observations. The same observed night-sky spectrum, scaled to 1800 s
effective exposure time, was used as input signal for all fibers. Afterwards, Poisson and read-out
noise were added to the simulated images, as well as empirical cosmics from the dark frames.

3.2. Parameter study to reach optimal performance 

We did not include additional signal from astronomical objects, 
given that the simulated IFU data of the night sky already in- 
clude continuum and bright emission line features similar to the 
characteristics of astronomical objects. The goal here was to test 
the performance of the detection algorithms when the signal in 
each spectrum is dominated by the sky rather than the typically 
fainter object signal. 

In order to properly compare the performance of the algorithms, we tested them with a grid of input
parameters because the optimal ones are unknown a priori for a given dataset. From the simulations,
we defined the number of detectable cosmics $N_{\rm c}$, which were 5σ above the noise of the
simulated image before the cosmics were added. We defined the detection rate as
$P_{\rm d} = N_{\rm d}/N_{\rm c}$, with the number of detected cosmics that match the input mask
($N_{\rm d}$). Pixels that were misclassified as cosmics by the algorithms are false detections
($N_{\rm f}$), expressed as a false detection rate $P_{\rm f} = N_{\rm f}/N_{\rm c}$. We defined
the detection efficiency as $\epsilon = N_{\rm d}/(N_{\rm c} + N_{\rm f})$, which has a value of
$\epsilon = 1$ for ideal detection rates and $0 < \epsilon < 1$ when the detection was incomplete
or the number of false detections was non-zero.
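The three quality measures follow directly from two boolean pixel masks. The snippet below is a minimal sketch; the function name `detection_metrics` and the toy masks are made up for illustration.

```python
import numpy as np

def detection_metrics(true_mask, found_mask):
    """Return (P_d, P_f, efficiency) for boolean cosmic-ray pixel masks."""
    true_mask = np.asarray(true_mask, dtype=bool)
    found_mask = np.asarray(found_mask, dtype=bool)
    n_c = int(true_mask.sum())                   # detectable cosmic pixels
    n_d = int((true_mask & found_mask).sum())    # detections matching the input mask
    n_f = int((~true_mask & found_mask).sum())   # false detections
    if n_c == 0:
        raise ValueError("input mask contains no cosmics")
    return n_d / n_c, n_f / n_c, n_d / (n_c + n_f)

# Toy case: 4 cosmic pixels, 3 of them recovered, 1 false detection.
truth = np.zeros((4, 4), dtype=bool)
truth[0, 0] = truth[1, 1] = truth[2, 2] = truth[3, 3] = True
found = truth.copy()
found[3, 3] = False      # one missed cosmic
found[0, 3] = True       # one misclassified pixel
p_d, p_f, eff = detection_metrics(truth, found)
```

For this toy case $P_{\rm d} = 0.75$, $P_{\rm f} = 0.25$, and $\epsilon = 3/5 = 0.6$. Because both rates are normalized by $N_{\rm c}$, $P_{\rm f}$ can exceed 100% when an algorithm flags more clean pixels than there are cosmics.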

Each of the algorithms has free parameters that need to be 
chosen by the user: 

a) DCR: subframe size of $I_i$, limiting sigma factor $\xi$, the number of iterations, and growing
radius (set to zero pixels),

b) L.A.Cosmic: significance $\sigma_{\rm lim}$, threshold $f_{\rm lim}$, threshold
$\sigma_{\rm frac}$, and the number of iterations,

c) PyCosmic: significance $\sigma_{\rm lim}$, threshold $r_{\rm lim}$ appropriate for the chosen
value of $w$ and the instrument-specific value of $\theta$, and the number of iterations.

We consistently used a maximum number of six iterations in all cases for our tests. For L.A.Cosmic
and PyCosmic, we set the significance level to $\sigma_{\rm lim} = 5$ to achieve comparable
results. The algorithm performance as a function of input parameters is summarized in Figs. 4-6 for
the three IFS instruments. Surprisingly, we found that the achievable performance was strongly
dependent on the IFS instrument characteristics.

For the simulated VIMOS IFU data, which is representative of well-sampled raw data with
$\theta \sim 3$ pixels, the L.A.Cosmic and PyCosmic algorithms performed almost equally well at
their best-parameter settings with a detection rate of $P_{\rm d} \sim 95\%$. PyCosmic achieved a
slightly higher detection rate ($P_{\rm d} = 96.1\%$) with a marginally higher false detection rate
($P_{\rm f} = 3.5\%$) compared to L.A.Cosmic ($P_{\rm d} = 94.5\%$, $P_{\rm f} = 3.3\%$). The
detection rate of $P_{\rm d} = 80\%$ for DCR may not be sufficient for many applications.

Interestingly, L.A.Cosmic showed the poorest performance for PMAS-PPak IFU data critically sampled
at $\theta \sim 2.4$ pixels. The instrumental characteristics responsible for this substandard
performance remain unclear. All parameter configurations gave $P_{\rm f} > 40\%$, which is
unacceptable. PyCosmic was clearly the best algorithm, with a high detection rate and an
accompanying low false detection rate ($P_{\rm f} < 1.5\%$). DCR performed as poorly as L.A.Cosmic
with low detection and high false detection rates.

Simulated data for the highly undersampled PMAS-LArr setup ($\theta \sim 1.5$ pixels) are clearly
the domain of L.A.Cosmic, because it was initially optimized for strongly undersampled Wide-Field
Planetary Camera 2 Hubble Space Telescope images. L.A.Cosmic reached a high detection rate at an
acceptable false detection rate. Nevertheless, PyCosmic achieved a similar efficiency when the
smoothing kernel width was set to $w = 1.0$ pixels, significantly smaller than the instrumental
PSF. We expected DCR to perform poorly on undersampled IFU data because real signal and cosmics are
hard to distinguish from count statistics. This is confirmed by our simulations.

Table 1 summarizes, for each setup, the algorithm parameters that give the highest efficiency
together with their corresponding $P_{\rm d}$ and $P_{\rm f}$ rates. For these "optimal" parameter
settings, we made an additional statistical comparison to further elucidate strengths and
weaknesses of each algorithm. The results are shown in Fig. 7. L.A.Cosmic and PyCosmic behaved
similarly well as $P_{\rm d}$ steeply rose to $P_{\rm d} = 100\%$ above the threshold significance
of 5σ. The DCR algorithm in all cases had a shallower curve and reached 100% efficiency at much
higher significance than the other two. Concerning misclassified pixels, we found that PyCosmic
tended to detect pixels with low counts as cosmics, while L.A.Cosmic did not.

4. Illustrative examples of cosmics detection for 
observed IFS data 

Although our simulations closely matched real observations, it is difficult to simulate realistic
data, including object signal, given the complexity of IFS data. Thus, we similarly processed data
from real observations for the different IFUs to check whether results from the simulations agreed
well with those of observed data. The optimal parameters as inferred from the simulations are used
for the different algorithms and are applied to a typical raw frame from the CALIFA survey taken
with the PMAS-PPak IFU at 900 s exposure time, to a 1100 s frame of a low-redshift galaxy taken
with the VIMOS IFU in MR mode, and to a 1800 s exposure of the center of a globular cluster taken
with the PMAS-LArr IFU using the R1200(bw) setup. Representative subframes and corresponding cosmic
masks recovered by the algorithms are shown in Fig. 8 for the different datasets.

In general, results obtained for real data reflected the outcome of the mock data analysis. Cosmics
of the selected subframes are illustrative of the strengths and weaknesses of the algorithms. In
case of PMAS-PPak data, we clearly see that DCR had problems detecting cosmics if the underlying
signal was already quite high, yet it had few false detections at the same time. The false
detections were a huge problem for L.A.Cosmic, not only in the simulations, because bright emission
lines in real data were also classified as cosmics. PyCosmic almost perfectly detected cosmics and
was by far the best algorithm for the PMAS-PPak instrument. This confirmed the necessity to develop
a new algorithm for the CALIFA survey.

Fig. 4. Performance and parameter study of the DCR, the L.A.Cosmic, and the PyCosmic algorithms
applied to simulated VIMOS IFU data in the MR mode with $\theta = 3$ pixels. The first row of all
panels compares the detection rate ($P_{\rm d}$) of cosmics, the second row compares the false
detection rate ($P_{\rm f}$), and the third row shows the combined efficiency ($\epsilon$). The
primary parameter that controls the performance of the algorithms is varied along the abscissa of
each column of panels, while a secondary parameter is shown as different symbols connected by
dotted lines as assigned in the legend.



5. Guidelines for algorithm selection and optimal 
parameter settings 

Based on the performance of the individual algorithms on simu- 
lated and real data, we try to provide useful guidelines here for 
users that need to tackle the problem of cosmics detection during 
the reduction of IFS data. 

5.1. DCR 

The performance of the algorithm depends only weakly on the subframe size. We recommend a symmetric
size of the order of 15×15 pixels. The main parameter to be set properly is $\xi$, which should be
$\xi \sim 3$ to achieve the highest performance. An exception to these recommendations are
undersampled data, where the subframe size seems to be important. A much higher value of $\xi$
needs to be chosen in this case ($\xi > 8$) to achieve an acceptable false detection rate (see
Fig. 6). Because of the intrinsically lower detection rate compared to the other two algorithms,
DCR should only be used in case the highest computational speed outweighs all other concerns.



5.2. L.A.Cosmic 



The results of L.A.Cosmic are mildly dependent on the growing radius. The middle columns of
Figs. 4-6 show the best results using a zero growing radius. In this case,
$\sigma_{\rm frac} = 1.0$, which is our recommended value. The improvement that can be achieved
with this parameter is relatively small when compared to models using a growing radius of one pixel
and $\sigma_{\rm frac} = 0.5$. Additionally, $P_{\rm d}$ and $P_{\rm f}$ vary smoothly with
$f_{\rm lim}$ above a certain limit, but it is difficult to predict an optimal value of
$f_{\rm lim}$ for a given instrument. In general, a value of $f_{\rm lim} > 5$ is required even for
well-sampled data. This is in contrast to the behavior and guidelines for imaging data given by
D01. While L.A.Cosmic is able to reach an excellent performance for some IFS data, it is not a
robust algorithm because it fails to produce acceptable results for certain IFU instrument
configurations (see Fig. 5). L.A.Cosmic may be appropriate for undersampled IFS data, but it should
not be applied to other IFS datasets without careful checking of the results.
5.3. PyCosmic

The width of the smoothing kernel should be set to $w < \theta$; otherwise, the optimal performance
of PyCosmic cannot be reached. This is most evident in undersampled data, where the maximum
efficiency decreases substantially with increasing $w$ (Fig. 6). For any given combination of $w$
and $\theta$, $r_{\rm lim}$ determines the efficiency of the detection. In extreme cases, as with
undersampled data, the tolerance in $r_{\rm lim}$ to reach the optimal efficiency is small.
However, comparing the theoretically derived minimum values of $r_{\rm lim}$ (cf. Fig. 3) with the
best values of the simulation, we consistently found that the optimal $r_{\rm lim}$ threshold needs
to be a factor of ~2 larger than estimated from Fig. 3. With these parameter settings, PyCosmic
provides the most robust detection efficiency for any IFS instrument configuration with an
efficiency of $\epsilon > 90\%$, well-defined parameter settings, and the possibility of reducing
the false detection rate $P_{\rm f}$ to nearly zero.

5.4. Handling cosmics in IFS data reduction



All algorithms attempt to restore the information of pixels that are affected by cosmics.
Nevertheless, the restored signal should be considered unreliable given the signal structure in IFS
data. Instead of the common practice of simply processing the "cleaned" image, we emphasize that
bad pixels can be nicely handled during the spectra extraction process when an optimal extraction
scheme is used (e.g., Horne 1986; Sharp & Birchall 2010). Given that optimal extraction algorithms
always assume a certain shape of the signal on the CCD, it is easy to mask bad pixels and restore
the signal at the cost of a higher associated variance. When too many pixels are affected by
cosmics on the raw frame for a given spectral-resolution element, they should be flagged as bad
elements that are propagated through the reduction pipeline to the final data product. We consider
this to be the best possible scheme to handle artefacts caused by cosmics in IFS data.
6. Conclusions

In this paper we have presented a novel detection algorithm for cosmics in single exposures called
PyCosmic. The algorithm combines Laplacian edge detection with a PSF convolution approach. We
systematically compared the performance of our new algorithm against other standard detection
algorithms, DCR (Pych 2004) and L.A.Cosmic (van Dokkum 2001), on simulated and real images from
fiber-fed IFS instruments. With the aid of these detailed comparison tests, we provide general
recommendations for the use of these algorithms for the detection of cosmics in IFS data.

We have found that DCR does not reach a detection efficiency equivalent to that of L.A.Cosmic and
PyCosmic. Therefore, we cannot recommend its use for IFS data in general, except when computational
speed is critical. The strength of the L.A.Cosmic algorithm is that it works best for undersampled
IFS data. However, a significant drawback is that the minimum false detection rate achievable for a
given IFS dataset is entirely set by the characteristics of the instrument and cannot be reduced by
changing any parameter settings. This peculiarity is most evident for PMAS-PPak IFU data from the
CALIFA survey (Sánchez et al. 2012), where the false detection rate of L.A.Cosmic is
$P_{\rm f} > 40\%$. Our PyCosmic algorithm reduces the false detection rate with different
parameter settings and solves this problem effectively. It has replaced the simplified R3D routine
(based solely on a Laplacian edge detection scheme) in the reduction pipeline of the CALIFA survey.

PyCosmic is the most robust detection algorithm for cosmics in fiber-fed IFS data. In combination
with well-characterized optimal parameter settings, it is well-suited for automatic usage for very
large datasets. CALIFA is already a huge IFS survey by current standards that has significantly
benefited from the development of PyCosmic. The next generation of IFS instruments like the
Sydney-AAO Multi-object IFS (SAMI, Croom et al. 2012) or the IFU project Mapping Nearby Galaxies at
APO (MaNGA) is already being built or is planned to carry out even larger IFS surveys. These
surveys will deliver IFS data for thousands of galaxies in the near future, which will certainly
benefit from robust data reduction algorithms such as PyCosmic.

The PyCosmic algorithm has recently been implemented in the versatile multi-IFU reduction software
P3D (Sandin et al. 2010) and is also available as a Python-based stand-alone program so that it can
be easily used or even added to any existing IFS reduction pipeline. Although PyCosmic has been
optimized for IFS data, we have also applied it successfully to longslit data and anticipate that
good results will be achieved with imaging data.
Fig. 8. Comparison of cosmics detected in real observations taken with three different IFS
instruments. Three representative subframes of the raw images (left column of thumbnail images)
were chosen to allow a good comparison of the results of the DCR, the L.A.Cosmic, and the PyCosmic
algorithms. Pixel masks of detected cosmics are shown in the three right panel columns.